25 research outputs found

    Efficient cube construction for smart city data

    To deliver powerful smart city environments, there is a requirement to analyse web-produced data streams in close to real time so that city planners can employ up-to-date predictive models in both short- and long-term planning. Data cubes, fused from multiple sources, provide a popular input to predictive models. A key component in this infrastructure is an efficient mechanism for transforming web data (XML or JSON) into multi-dimensional cubes. In our research, we have developed a framework for the efficient transformation of XML data from multiple smart city services into DWARF cubes using a NoSQL storage engine. Our evaluation shows a high level of performance when compared to other approaches and thus provides a platform for predictive models in a smart city environment.
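
    As a rough illustration of the transformation step only (not the authors' DWARF implementation, which additionally shares common prefixes and suffixes to keep the cube compact), the sketch below parses hypothetical XML sensor records into dimension/measure tuples and aggregates them over every combination of dimensions, the basic group-by lattice a cube materialises. Element and attribute names such as `reading`, `district`, `sensor` and `value` are invented for the example.

```python
# Simplified sketch: XML records -> aggregated cuboids (not the DWARF structure itself).
# The XML layout (<reading district=... sensor=... value=...>) is hypothetical.
import xml.etree.ElementTree as ET
from itertools import combinations
from collections import defaultdict

xml_doc = """
<readings>
  <reading district="north" sensor="air_quality" value="41.2"/>
  <reading district="north" sensor="traffic" value="230"/>
  <reading district="south" sensor="air_quality" value="38.7"/>
</readings>
"""

DIMENSIONS = ("district", "sensor")

def build_cuboids(xml_text):
    """Aggregate the 'value' measure over every subset of the dimensions."""
    rows = [(tuple(r.get(d) for d in DIMENSIONS), float(r.get("value")))
            for r in ET.fromstring(xml_text).iter("reading")]
    cuboids = defaultdict(lambda: defaultdict(float))
    for k in range(len(DIMENSIONS) + 1):           # all group-by combinations
        for dims in combinations(range(len(DIMENSIONS)), k):
            for key, value in rows:
                group = tuple(key[i] for i in dims)
                cuboids[dims][group] += value      # SUM as the sample aggregate
    return cuboids

for dims, groups in build_cuboids(xml_doc).items():
    names = tuple(DIMENSIONS[i] for i in dims) or ("ALL",)
    print(names, dict(groups))
```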

    Constructing data marts from web sources using a graph common model

    At a time when humans and devices are generating more information than ever, activities such as data mining and machine learning become crucial. These activities enable us to understand and interpret the information we have and predict, or better prepare ourselves for, future events. However, activities such as data mining cannot be performed without a layer of data management to clean, integrate, process and make available the necessary datasets. To that end, large and costly data flow processes such as Extract-Transform-Load are necessary to extract data from disparate information sources and generate analysis-ready datasets. These datasets are generally in the form of multi-dimensional cubes from which different data views can be extracted for the purpose of different analyses. The process of creating a multi-dimensional cube from integrated data sources is a significant undertaking. In this research, we present a methodology to generate these cubes automatically or, in some cases, close to automatically, requiring very little user interaction. A construct called a StarGraph acts as a canonical model for our system, to which imported data sources are transformed. An ontology-driven process controls the integration of StarGraph schemas, and simple OLAP-style functions generate the cubes or datasets. An extensive evaluation is carried out using a large number of agri-data sources, with user-defined case studies to identify sources for integration and the types of analyses required for the final data cubes.

    Identifying extra-terrestrial intelligence using machine learning

    Since the establishment of the SETI Institute, its scientists have used various approaches in their search for extra-terrestrial intelligence (SETI). A novel idea involved applying image categorisation techniques to classify radio signals represented by 2D spectrograms. The dataset of simulated radio signals, created for classification purposes, has been used in this work to train models based on neural network architectures. It is shown in this paper that combining three different models, trained on features obtained by various techniques, has a positive impact on model accuracy and performance. Features learned by a convolutional neural network (CNN), bottleneck features from existing models and manually extracted features from the spectrograms comprised the three feature sets used as training data for the combined model. It was also shown that combining different methods of spectrogram generation improved the accuracy of the final model.
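
    As a minimal sketch of the kind of late-fusion ensemble the abstract describes (not the authors' exact architecture), the example below trains one classifier per feature set and averages their class probabilities. The random arrays are stand-ins for the CNN features, bottleneck features and hand-crafted spectrogram features; the class count and classifier choice are assumptions.

```python
# Hedged sketch: late fusion of three classifiers, one per feature set.
# Random arrays stand in for CNN, bottleneck and hand-crafted spectrogram features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_signals, n_classes = 200, 7                         # e.g. 7 simulated signal classes
labels = rng.integers(0, n_classes, n_signals)

feature_sets = {
    "cnn": rng.normal(size=(n_signals, 128)),         # learned CNN features
    "bottleneck": rng.normal(size=(n_signals, 256)),  # features from a pretrained model
    "manual": rng.normal(size=(n_signals, 16)),       # hand-crafted spectrogram statistics
}

# Train one model per feature set.
models = {name: LogisticRegression(max_iter=1000).fit(X, labels)
          for name, X in feature_sets.items()}

# Combine by averaging predicted class probabilities (late fusion).
avg_proba = np.mean([models[name].predict_proba(X)
                     for name, X in feature_sets.items()], axis=0)
combined_pred = avg_proba.argmax(axis=1)
print("ensemble training accuracy:", (combined_pred == labels).mean())
```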

    An ostensive information architecture to enhance semantic interoperability for healthcare information systems

    Semantic interoperability establishes intercommunication and enables data sharing across disparate systems. In this study, we propose an ostensive information architecture for healthcare information systems to decrease ambiguity caused by using signs in different contexts for different purposes. The ostensive information architecture adopts a consensus-based approach initiated from the perspective of information systems re-design and can be applied to other domains where information exchange is required between heterogeneous systems. Driven by the issues in FHIR (Fast Healthcare Interoperability Resources) implementation, an ostensive approach that supplements the current lexical approach in semantic exchange is proposed. A Semantic Engine with an FHIR knowledge graph at its core is constructed using Neo4j to provide semantic interpretation and examples. The MIMIC-III (Medical Information Mart for Intensive Care) datasets and diabetes datasets have been employed to demonstrate the effectiveness of the proposed information architecture. We further discuss the benefits of separating semantic interpretation from data storage from the perspective of information system design, and the semantic reasoning towards patient-centric care underpinned by the Semantic Engine.
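
    To make the "semantic interpretation and examples" idea concrete, the sketch below queries a Neo4j graph for the definition of an FHIR element together with example codes. It assumes a local Neo4j instance, and the node labels, relationship types and properties are invented for illustration; they are not the authors' actual Semantic Engine schema.

```python
# Hedged sketch: looking up an FHIR element in a Neo4j knowledge graph.
# The graph schema (Resource, Element, ValueSet, Code) is an assumption.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (r:Resource {name: $resource})-[:HAS_ELEMENT]->(e:Element {name: $element})
OPTIONAL MATCH (e)-[:BOUND_TO]->(v:ValueSet)-[:CONTAINS]->(c:Code)
RETURN e.definition AS definition, collect(c.display) AS example_codes
"""

def interpret(resource, element):
    """Return a human-readable definition and example codes for an FHIR element."""
    with driver.session() as session:
        record = session.run(CYPHER, resource=resource, element=element).single()
        if record is None:
            return None, []
        return record["definition"], record["example_codes"]

if __name__ == "__main__":
    definition, examples = interpret("Observation", "code")
    print(definition, examples[:5])
```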

    The L2L System for Second Language Learning Using Visualised Zoom Calls Among Students

    An important part of second language learning is conversation, which is best practised with speakers whose native language is the language being learned. We facilitate this by pairing students from different countries learning each other's native language. Mixed groups of students have Zoom calls, half in one language and half in the other, in order to practise and improve their conversation skills. We use Zoom video recordings with audio transcripts enabled, which generate recognised speech from which we extract timestamped utterances and then calculate and visualise conversation metrics on a dashboard. A timeline highlights each utterance, colour-coded per student, with links to the video in a playback window. L2L was deployed for a semester and recorded almost 250 hours of Zoom meetings. The conversation metrics visualised on the dashboard are a beneficial asset for both students and lecturers.
    Comment: 16th European Conference on Technology-Enhanced Learning (EC-TEL), Bozen-Bolzano, Italy (online), September 2021
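
    As a minimal sketch of deriving conversation metrics from timestamped utterances (not the L2L system's own code), the example below parses a transcript in the common Zoom .vtt layout, assumed here to be "HH:MM:SS.mmm --> HH:MM:SS.mmm" cue timings followed by "Speaker Name: utterance", and computes per-speaker speaking time and turn counts.

```python
# Hedged sketch: simple conversation metrics from a Zoom-style transcript.
import re
from collections import defaultdict

CUE_TIME = re.compile(r"(\d+):(\d+):(\d+\.\d+) --> (\d+):(\d+):(\d+\.\d+)")

def seconds(h, m, s):
    return int(h) * 3600 + int(m) * 60 + float(s)

def conversation_metrics(vtt_text):
    """Return per-speaker total speaking time (seconds) and number of utterances."""
    talk_time = defaultdict(float)
    turns = defaultdict(int)
    lines = vtt_text.splitlines()
    for i, line in enumerate(lines):
        match = CUE_TIME.search(line)
        if match and i + 1 < len(lines) and ":" in lines[i + 1]:
            start = seconds(*match.groups()[:3])
            end = seconds(*match.groups()[3:])
            speaker = lines[i + 1].split(":", 1)[0].strip()
            talk_time[speaker] += end - start
            turns[speaker] += 1
    return dict(talk_time), dict(turns)

sample = """WEBVTT

1
00:00:03.600 --> 00:00:06.200
Alice: Bonjour, comment allez-vous ?

2
00:00:06.500 --> 00:00:09.100
Brian: Très bien, merci. And how are you?
"""
print(conversation_metrics(sample))
```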

    Identification of movement categories and associated velocity thresholds for elite Gaelic football and hurling referees

    The purpose of this study was to generate movement category velocity thresholds for elite Gaelic football (GF) and hurling referees using a two-stage unsupervised clustering technique. Activity data from 41 GF and 38 hurling referees were collected using global positioning system technology during 338 and 221 competitive games, respectively. The elbow method was used in stage one to identify the number of movement categories in the datasets. In stage two, the respective velocity thresholds for each category were identified using spectral clustering. The efficacy of these thresholds was examined using a regression analysis performed between the median of each of the velocity thresholds and the raw velocity data. Five velocity thresholds were identified for both GF and hurling referees (mean ± standard deviation: GF referees: 0.70±0.09, 1.66±0.19, 3.28±0.41, 4.87±0.61, 6.49±0.50 m·s−1; hurling referees: 0.69±0.11, 1.60±0.25, 3.09±0.52, 4.63±0.58, 6.35±0.43 m·s−1). With the exception of the lowest velocity threshold, all other thresholds were significantly higher for GF referees. The newly generated velocity thresholds were more strongly associated with the raw velocity data than traditional generic categories. The provision of unique velocity thresholds will allow applied practitioners to better quantify the activity profile of elite GF and hurling referees during training and competition.
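
    As an illustration of the two-stage idea described above (elbow method to choose the number of categories, then spectral clustering on velocity samples), the sketch below uses synthetic velocities in place of the referees' GPS data and reads thresholds off the cluster boundaries; it is not the study's analysis code, and the threshold definition is an assumption.

```python
# Hedged sketch: elbow method + spectral clustering on 1-D velocity data.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

rng = np.random.default_rng(1)
velocity = rng.gamma(shape=2.0, scale=1.2, size=3000).reshape(-1, 1)   # m/s, synthetic

# Stage 1: elbow method, inspect the drop in within-cluster inertia.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=1)
               .fit(velocity).inertia_ for k in range(2, 9)}
print(inertias)            # choose k where the curve flattens (say k = 5)
k = 5

# Stage 2: spectral clustering, then take midpoints between adjacent clusters as thresholds.
sample = velocity[rng.choice(len(velocity), 800, replace=False)]       # keep it tractable
labels = SpectralClustering(n_clusters=k, random_state=1,
                            affinity="nearest_neighbors").fit_predict(sample)
order = np.argsort([sample[labels == c].mean() for c in range(k)])
thresholds = [(sample[labels == order[i]].max() +
               sample[labels == order[i + 1]].min()) / 2 for i in range(k - 1)]
print("velocity thresholds (m/s):", np.round(thresholds, 2))
```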

    Usage-based summaries of learning videos

    Much of the delivery of university education is now by synchronous or asynchronous video. For students, one of the challenges is managing the sheer volume of such video material, as video presentations of taught material are difficult to abbreviate and summarise because they do not have highlights which stand out. Apart from video bookmarks, there are no tools available to determine which parts of video content should be replayed at revision time or just before examinations. We have developed and deployed a digital library for managing video learning material which has many dozens of hours of short-form video content from a range of taught courses for hundreds of students at undergraduate level. Through a web browser we allow students to access and play these videos, and we log their anonymised playback usage. From these logs we assign a score to each segment of each video based on the amount of playback it receives from across all students, whether the segment has been rewound and replayed in the same student session, whether the on-screen window is the window in focus on the student's desktop/laptop, and the speed of playback. We also incorporate negative scoring if a video segment is skipped or fast-forwarded, and overarching all this we include a decay function based on recency of playback, so the most recent days of playback contribute more to the video segment scores. For each video in the library we present a usage-based graph which allows students to see which parts of each video attract the most playback from their peers, which helps them select material at revision time. Usage of the system is fully anonymised and GDPR-compliant.
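
    The following sketch illustrates the general shape of usage-based segment scoring with a recency decay. The event fields, weights and half-life are invented for illustration; the deployed system's exact scoring scheme is not detailed in the abstract.

```python
# Hedged sketch: score video segments from playback events with exponential recency decay.
import math
from collections import defaultdict

HALF_LIFE_DAYS = 7            # assumed decay half-life
WEIGHTS = {"play": 1.0, "replay": 2.0, "skip": -1.0, "fast_forward": -0.5}

def decay(days_ago):
    """Exponential decay so recent playback contributes more."""
    return 0.5 ** (days_ago / HALF_LIFE_DAYS)

def score_segments(events):
    """events: iterable of (video_id, segment_idx, event_type, days_ago, in_focus, speed)."""
    scores = defaultdict(float)
    for video, segment, kind, days_ago, in_focus, speed in events:
        weight = WEIGHTS.get(kind, 0.0)
        weight *= 1.0 if in_focus else 0.5        # discount unfocused windows
        weight /= max(speed, 1.0)                 # discount sped-up playback
        scores[(video, segment)] += weight * decay(days_ago)
    return scores

events = [
    ("cs101_week3", 12, "play", 1, True, 1.0),
    ("cs101_week3", 12, "replay", 1, True, 1.0),
    ("cs101_week3", 13, "skip", 2, True, 1.0),
    ("cs101_week3", 12, "play", 20, False, 2.0),
]
print(score_segments(events))
```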

    Automating data mart construction from semi-structured data sources

    The global food and agricultural industry had a total market value of USD 8 trillion in 2016, and decision makers in the Agri sector require appropriate tools and up-to-date information to make predictions across a range of products and areas. Traditionally, these requirements are met with information processed into a data warehouse and data marts constructed for analyses. Increasingly, however, data is coming from outside the enterprise and often in unprocessed forms. As these sources are outside the control of companies, they are prone to change, and new sources may appear. In these cases, the process of accommodating these sources can be costly and very time-consuming. To automate this process, what is required is a sufficiently robust Extract-Transform-Load (ETL) process in which external sources are mapped to some form of ontology, together with an integration process to merge the specific data sources. In this paper, we present an approach to automating the integration of data sources in an Agri environment, where new sources are examined before an attempt to merge them with existing data marts. Our validation uses three separate case studies of real-world data to demonstrate the robustness of our approach and the efficiency of materialising data marts.

    A methodology for validating diversity in synthetic time series generation

    In order for researchers to deliver robust evaluations of time series models, high volumes of data are often required to ensure the appropriate level of rigour in testing. However, for many researchers, the lack of time series data presents a barrier to a deeper evaluation. While researchers have developed and used synthetic datasets, the development of this data requires a methodological approach to testing the entire dataset against a set of metrics which capture the diversity of the dataset. Unless researchers are confident that their test datasets display a broad set of time series characteristics, they may favour one type of predictive model over another. This can have the effect of undermining the evaluation of new predictive methods. In this paper, we present a new approach to generating and evaluating large volumes of time series data. The construction algorithm and validation framework are described in detail, together with an analysis of the level of diversity present in the synthetic dataset.
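
    As a small sketch of the underlying idea, the example below generates synthetic series with varied components and checks how widely a few simple characteristics spread across the collection. The generator and the chosen metrics (trend slope, lag-1 autocorrelation, residual variability after detrending) are illustrative assumptions, not the paper's construction algorithm or validation framework.

```python
# Hedged sketch: generate synthetic time series and measure the spread of simple characteristics.
import numpy as np

rng = np.random.default_rng(42)

def synthetic_series(n=365):
    """Random mix of trend, seasonality and noise."""
    t = np.arange(n)
    trend = rng.uniform(-0.05, 0.05) * t
    season = rng.uniform(0, 3) * np.sin(2 * np.pi * t / rng.choice([7, 30, 365]))
    noise = rng.normal(0, rng.uniform(0.5, 2.0), n)
    return trend + season + noise

def characteristics(y):
    """Trend slope, lag-1 autocorrelation, and share of variability left after detrending."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    detrended = y - (slope * t + intercept)
    ac1 = np.corrcoef(y[:-1], y[1:])[0, 1]
    residual_share = np.std(detrended) / (np.std(y) + 1e-9)
    return slope, ac1, residual_share

dataset = [synthetic_series() for _ in range(500)]
metrics = np.array([characteristics(y) for y in dataset])
# Wide ranges / high spread across these metrics indicate a diverse test set.
print("metric std devs:", metrics.std(axis=0).round(3))
print("metric ranges:", (metrics.max(axis=0) - metrics.min(axis=0)).round(3))
```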